Update Shapley.R #150

Open
wants to merge 1 commit into
base: main

Conversation

TomasZdrazil

When y.hat.diff$feature.value is created, the values are built from the column names of x.interest and simply appended as another column. However, the column order of x.interest may differ from the order of the same features in y.hat.diff$feature, so a lookup by feature name (like a spreadsheet VLOOKUP) is needed instead of appending the column. To do this, auxiliaryTab is created from the feature names of x.interest, and the merge function is then used to assign the correct feature.value to each feature.
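The lookup described above can be sketched in base R as follows. This is only an illustration, not the patch itself: the data values are made up, and the "name=value" format of feature.value is an assumption about how the column is built.

```r
# Hypothetical stand-ins for the objects named in the description.
x.interest <- data.frame(age = 41, income = 52000, tenure = 3)

# The features appear here in a different order than the columns of x.interest.
y.hat.diff <- data.frame(
  feature = c("tenure", "age", "income"),
  phi = c(0.10, -0.25, 0.40),
  stringsAsFactors = FALSE
)

# Lookup table: one row per feature with its formatted value.
# The "name=value" format is an assumption made for this sketch.
values <- vapply(x.interest, as.character, character(1), USE.NAMES = FALSE)
auxiliaryTab <- data.frame(
  feature = colnames(x.interest),
  feature.value = sprintf("%s=%s", colnames(x.interest), values),
  stringsAsFactors = FALSE
)

# Join on the feature name instead of relying on position.
aligned <- merge(y.hat.diff, auxiliaryTab, by = "feature")
print(aligned)
```

Because the join key is the feature name, each row pairs a feature with its own value regardless of the column order in x.interest. Note that merge may reorder the rows, so any code that depends on row order should not assume the original order is kept.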

@christophM
Collaborator

Thanks for this pull request.
The tests don't pass: it seems that the Shapley values no longer add up to the difference in the test, as they should.

@TomasZdrazil
Author

Hi, thank you for your comment. I checked, and it seems the Shapley values don't add up to the difference even by default when running iml_0.10.1. Might that be an issue in your package? I'm not sure. The code I used to test this is attached. The R session info:
R version 3.6.1 (2019-07-05), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 10 x64 (build 18362)

ShapleySumTest.txt

@christophM
Collaborator

It does add up, but only in expectation: as you increase sample.size in Shapley$new, the sum of the Shapley values gets closer to the difference.

The test for Shapley to add up can be found here: https://github.com/christophM/iml/blob/master/tests/testthat/test-Shapley.R
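The "adds up only in expectation" point can be illustrated with a generic permutation-sampling estimator in base R. This is a minimal sketch under stated assumptions, not iml's implementation: the model f, the background data X, the instance x.interest, and the helper shapley_mc are all hypothetical.

```r
set.seed(1)

# Toy model and background data, made up for illustration.
f <- function(d) d$x1 - 2 * d$x2 + 0.5 * d$x3 + d$x1 * d$x2
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
x.interest <- data.frame(x1 = 1, x2 = -1, x3 = 2)

# Monte Carlo Shapley estimate: each iteration draws one random feature
# order and one random background row, then switches features from the
# background value to the value of x one at a time.
shapley_mc <- function(f, X, x, M) {
  feats <- colnames(X)
  phi <- setNames(numeric(length(feats)), feats)
  for (m in seq_len(M)) {
    current <- X[sample(nrow(X), 1), , drop = FALSE]
    prev <- f(current)
    for (name in sample(feats)) {
      current[[name]] <- x[[name]]          # look up by name, not position
      now <- f(current)
      phi[name] <- phi[name] + (now - prev) # marginal contribution
      prev <- now
    }
  }
  phi / M
}

# The quantity the Shapley values should sum to.
target <- f(x.interest) - mean(f(X))
gap <- function(M) abs(sum(shapley_mc(f, X, x.interest, M)) - target)

print(c(small = gap(20), large = gap(2000)))
```

In this sketch the sum of the estimates equals the prediction for x.interest minus the average prediction over the sampled background rows, so the remaining gap is pure sampling noise and tends to shrink as M grows. A gap that stays large at a high sample size therefore points to something other than sampling error.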

@TomasZdrazil
Author

Thanks, I will have to look into that more deeply, because for my data they do not add up and the gap is quite big: with sample.size = 3000, the actual difference is more than twice the sum of the Shapley values.

Anyway, this pull request is aimed at a different issue: when the column order in the training data (predictor$data$X) differs from the column order in the record to explain (x.interest), the result is misleading. The feature and feature.value columns of shapley$results get out of sync, i.e. within a single row, the feature named in feature is not the one named in feature.value. This leads, for example, to a wrong plot: shapley$plot uses feature.value as the label, so the phi values are shown for the wrong feature. The attached script demonstrates the issue.

ShapleyColsOrderTest.txt

Can you confirm this behaviour? I guess the workaround is to order the columns of both datasets manually before running the Shapley analysis, but I thought it would be more elegant to handle this in the function directly, as a user may not know about this requirement.
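The manual workaround mentioned above might look like the following sketch. The data frames are made up; X stands in for predictor$data$X, and the check assumes both objects contain exactly the same set of feature names.

```r
# Stand-in for the training features (predictor$data$X); values are made up.
X <- data.frame(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6))

# Record to explain, with the same columns in a different order.
x.interest <- data.frame(x3 = 50, x1 = 10, x2 = 30)

# Fail early if the two objects do not share the same feature names.
stopifnot(setequal(colnames(X), colnames(x.interest)))

# Reorder the columns of x.interest to match the training data.
x.interest <- x.interest[, colnames(X), drop = FALSE]
print(colnames(x.interest))
```

Selecting columns by name keeps each value attached to its feature, so only the order changes. Doing this once before constructing the explainer avoids the mismatch without touching the package code.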
